Arrangement and method for the recording and display of images of a scene and/or an object

ABSTRACT

The invention relates to an arrangement and a method for capturing and displaying images of a scene and/or an object. Said arrangement and method are particularly suited to display the captured images in a three-dimensionally perceptible manner. The aim of the invention is to create a new possibility to take images of real scenes and/or objects with as little effort as possible and then autostereoscopically display the same in a three-dimensional fashion from two or more perspectives. Said aim is achieved by providing at least one main camera of a first camera type for recording images, at least one satellite camera of a second camera type for recording images, an image converting device which is mounted behind the cameras, and a 3D image display device, the two camera types differing from each other in at least one parameter. A total of at least three cameras is provided. Also disclosed is a method for transmitting 3D data.

CROSS-REFERENCE TO RELATED APPLICATIONS

This is a national phase application of International Application No. PCT/DE2007/001965, filed Oct. 29, 2007, which claims priority of German Application No. 10 2006 055 641.0, filed Nov. 22, 2006, the complete disclosures of which are hereby incorporated by reference.

BACKGROUND OF THE INVENTION

a) Field of the Invention

The invention relates to an arrangement and a method for the recording and display of images of a scene and/or an object, suitable especially for the display of the recorded images for spatial perception. The invention further relates to a method of transmitting images for spatial perception.

b) Description of the Related Art

At present there exist essentially three basically different methods, and the appertaining arrangements, for recording 3D image information:

(1) The classical stereo camera, consisting of two like cameras, one each for a left and right image. For a highly resolved display, high-resolving camera systems are required, though. With multichannel systems, interpolation of the intermediate views is required. This causes artefacts to be visible especially in the middle views.

(2) The use of a multiview camera system. Its advantage over the stereo camera is the correct image reproduction with multichannel systems. In particular, no interpolations are required. A disadvantage is the great effort needed to implement exact alignment of the (e.g., eight) cameras with each other. Another disadvantage is the increased cost involved in the use of several cameras, which in addition entail further problems such as differing white levels, tonal values and/or geometry characteristics, which have to be balanced accordingly. The high data rate to be handled with this method is also a drawback.

(3) The use of a depth camera. Here, use is made of a color camera jointly with a depth sensor, which registers the (as a rule, cyclopean) depth information of the scene to be recorded. Apart from the fact that depth sensors are relatively expensive, their drawback is that they often do not work very exactly, and/or that no acceptable compromise between accuracy and speed is achieved. The method requires a general extrapolation, in which artefacts, especially in the outer views, cannot be excluded and, generally, occluding artefacts cannot be covered up.

OBJECT AND SUMMARY OF THE INVENTION

The invention is based on the problem of setting forth a new way of recording real scenes and/or objects with the least possible effort and subsequently displaying them three-dimensionally in two or more views for spatial perception. Another problem of the invention is to find a suitable method for transmitting images for spatial perception.

According to the invention, the problem is solved with an arrangement for recording images of a scene and/or an object and displaying them for spatial perception, this arrangement comprising the following components:

at least one main camera of a first camera type for the recording of images;

at least two satellite cameras of a second camera type for the recording of images, with the first and second camera types differing by at least one parameter;

an image conversion device arranged downstream of the cameras, for receiving and processing the initial image data, this image conversion device performing, among other processes, a depth or disparity recognition, for which only those images are employed that were recorded by cameras of the same camera type (preferably those that were recorded by the at least two satellite cameras), but not the remaining images; and

a 3D image display device connected to the image conversion device, which displays the image data for spatial perception without special viewing aids, the 3D image display device displaying at least two views of the scene and/or object.

However, the 3D image display device may also display 3, 4, 5, 6, 7, 8, 9 or even more views simultaneously or on a time average. Especially in image display devices of the last-named, so-called “multi-view” 3D type with 4 or more views displayed, the special advantages of the invention take effect, viz. that it is possible, with relatively few (e.g. three or four) cameras, to provide more views than the number of cameras employed.

The main and the satellite cameras generally, but not necessarily, differ in quality. Mostly, the main camera is a high-quality camera, whereas the satellite cameras employed may be of lesser quality (e.g., industrial cameras) and thus mostly, but not necessarily, have a lower resolution, among other parameters. In this case, then, the second camera type has a lower resolution than the first one. The two camera types may also differ (at least) by the built-in imaging chip.

Essentially, the advantage of the invention is that, besides the classical stereo camera system, here consisting essentially of two identical high-resolution cameras, a three-camera system is used, preferably consisting of a central high-quality camera and two additional cameras of lower resolution, arranged to the left and right, respectively, of the main camera. Thus, the main camera is arranged between the satellite cameras, for example.

Preferably then, the main camera is arranged between the satellite cameras. The distances between the cameras and their alignment (either in parallel or pointed at a common focus) are variable within customary limits. The use of further satellite cameras may be of advantage, as this enables a further reduction of misinterpretations especially during the subsequent processing of the image data.

According to the embodiment of the invention, it may thus be of advantage that

exactly one main camera and two satellite cameras are provided (“version 1+2”);

exactly one main camera and three satellite cameras are provided (“version 1+3”); or

exactly one main camera and five satellite cameras are provided (“version 1+5”).

The general idea of the invention obviously also includes other embodiments, e.g., the use of several main cameras or of still more satellite cameras.

All cameras may be arranged in parallel or pointed at a common focus. It is also possible that not all of them are pointed at a common focus (convergence angle). The optical axes of the cameras may lie in one plane or in different planes, with the center points of the objectives preferably arranged in line or on a (preferably isosceles or equilateral) triangle. For special cases of application, the center points of the cameras' objectives may also be spaced at unequal distances relative to each other (with the objective center points forming a scalene triangle). It is further possible that all (at least three) cameras (i.e. all existing main and satellite cameras) differ by at least one parameter, e.g. by their resolution. The cameras can be synchronized with regard to zoom, f-stop, focus etc. as well as with regard to the individual frames (i.e. best possible true-to-frame synchronization in recording). The cameras may be fixed at permanent locations or movable relative to each other; the setting of both the base distance between the cameras and the convergence angles may be automatic.

It may be of advantage to provide adapter systems that facilitate fixing especially the satellite cameras to the main camera. In this way, ordinary cameras can be subsequently converted into a 3D camera. It is also feasible, though, to convert an existing stereo camera system into a 3D camera conforming to the invention by retrofitting an added main camera.

Furthermore, the beam path, preferably in front of the objectives of the various cameras, can be provided with additional optical elements, e.g. one or several semitransparent mirrors. This makes it possible, e.g., to arrange each of two satellite cameras rotated 90 degrees relative to the main camera, so that the camera bodies of all three cameras are arranged in such a way that their objective center points are closer together horizontally than they would be if all three cameras were arranged immediately side by side, in which case the dimension of the camera bodies would necessitate a certain, greater spacing of the objective center points. In the constellation with the two satellite cameras rotated 90 degrees, a semitransparent mirror in reflection position, arranged at an angle of about 45 degrees relative to the principal rays emerging from the objectives of the satellite cameras, would follow, whereas the same mirror follows in transmission position, arranged at an angle of also 45 degrees relative to the principal ray emerging from the objective of the main camera.

Preferably, the objective center points of the main camera and of at least two satellite cameras form an isosceles triangle, e.g., in version “1+2”.

For version “1+3” it may be of advantage that the objective center points of the three satellite cameras form a triangle, preferably an isosceles one. In this case, the objective center point of the main camera should be arranged within the said triangle, the triangle being assumed to include its sides.

Moreover, in version “1+3” it is possible that one satellite camera and the main camera are optically arranged relative to each other in such a way that both record an image on essentially the same optical axis, with preferably at least one semitransparent mirror being arranged between the two cameras.

In this case, the two other satellite cameras are preferably arranged so as to form a straight line or a triangle with the satellite camera that is associated with the main camera.

Other embodiments can also be implemented, such as, e.g., a version “1+4” with a quadrangle (e.g. square) of 4 satellite cameras with a main camera inside the quadrangle (e.g., at the center of its area), or even a version “1+n” with a circle of n=5 or more satellite cameras.

Advantageously, the image conversion device generates at least three views of the recorded scene or object, employing for such generation of at least three views, in addition to the depth or disparity data registered, the image recorded by the at least one main camera and at least two more images recorded by the satellite cameras, though not necessarily by all cameras provided. It is quite possible that one of the at least three views generated still is equal to one of the input images. In the simplest case, the image conversion device may even use only the image recorded by the at least one main camera and the associated depth information for generating the views.

The main camera (or all main cameras) and all satellite cameras preferably record with frame-accurate synchronization, at a tolerance of at most 100 frames per 24 hours.
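To give a feeling for the stated tolerance, the following minimal sketch converts "100 frames per 24 hours" into a relative clock mismatch. The function name and the frame rate of 25 frames per second are illustrative assumptions; the text itself does not name a frame rate.

```python
def sync_tolerance_ppm(max_frames_off, fps, hours=24.0):
    """Relative clock mismatch (parts per million) that produces at most
    `max_frames_off` frames of drift after `hours` of recording."""
    total_frames = fps * hours * 3600.0
    return max_frames_off / total_frames * 1e6


# 25 frames per second is an assumed frame rate, not a value from the text.
ppm = sync_tolerance_ppm(100, fps=25.0)
print(f"{ppm:.1f} ppm")
```

Under this assumption the tolerance corresponds to a clock mismatch of roughly 46 parts per million between any two cameras.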

For special embodiments it may also be useful to use black-and-white cameras as satellite cameras and, preferably, to assign a tonal value automatically to the images produced by them afterwards.

The problem is also solved by a method for the recording and display of images of a scene and/or an object, comprising the following steps:

Creation of at least one n-tuple of images, with n>2, with at least two images having different resolutions;

Transfer of the image data to an image conversion device, in which subsequently a rectification, a color adjustment, a depth or disparity recognition and subsequent generation of further views from n or fewer than n images of said n-tuple and the depth or disparity recognition values are carried out, with at least one view being generated that is not exactly equal to any of the images of the n-tuple created, and with the image conversion device employing, for depth or disparity recognition, only such images of the n-tuple that have the same resolution;

Subsequent creation of a combination of at least three different views or images in accordance with the parameter assignment of the 3D display of a 3D image display device for spatial presentation without special viewing aids; and finally

Presentation of the combined 3D image on the 3D display.

The depth or disparity recognition employs those images of equal resolution whose resolution has the lowest total number of pixels of all resolutions provided.

The depth recognition and subsequent generation of further views from the n-tuple of images and the depth or disparity recognition data can be carried out, for example, by creating a stack structure and projecting it onto a desired view.

The creation of a stack structure may be replaced by other applicable depth or disparity recognition algorithms, with the depth or disparity values recognized being used for the creation of desired views.

A stack structure may, in general, correspond to a layer structure of graphical elements in different (virtual) planes.

If a 3D camera system consisting of cameras of different types with different image resolutions is used, it is possible first to carry out a size adaptation after transfer of the image data to the image conversion device. The result of this is a set of images that all have the same resolution. This may correspond to the highest resolution of the cameras, but preferably it is equal to that of the lowest-resolution camera(s). Subsequently, the camera images are rectified, i.e. their geometric distortions are corrected (compensation of lens distortions, misalignment of cameras, zoom differences, etc., if any). The size adaptation may also be performed within the rectifying process. Immediately after, a color adjustment is carried out, e.g. as taught by the publications “Joshi, N. Color Calibration for Arrays of Inexpensive Image Sensors. Technical Report CSTR 2004-02 Mar. 31, 2004 Apr. 4, 2004, Stanford University, 2004” and A. Ilie and G. Welch. “Ensuring color consistency across multiple cameras”, ICCV 2005. In particular, the tonal/brightness values of the camera images are matched, so that they are at an equal or at least comparable level. For the image data thus provided, the stack structure for depth recognition is established. In this process, the input images (only the images of the n-tuple having the same resolution), stacked on top of each other in the first step, are compared with each other line by line. The line-by-line comparison can alternatively be made in an oblique direction; this will be favorable if the cameras are not arranged in a horizontal plane. If pixels lying on top of each other have the same tonal value, this will be saved; if they have different tonal values, none of these will be saved. Thereafter, the lines are displaced relative to each other by defined steps (e.g., by ¼ or ½ pixel) in opposite directions; after every step the result of the comparison is saved again. At the end of this process, the three-dimensional stack structure with the coordinates X, Y and Z is obtained, with X and Y corresponding to the pixel coordinates of the input image, whereas Z represents the extent of relative displacement between the views. Thus, if two or three cameras are used, always two or three lines, respectively, are compared and displaced relative to each other. It is also possible to use more than two, e.g., three cameras and still combine always two lines only, in which case the comparisons have to be matched once more. If three or more lines are compared, there are far fewer ambiguities than with the comparison of the two lines of two input images only. In the subsequent optimization of the stack structure, the task essentially consists in deleting the least probable combinations in case of ambiguous representations of image elements in the stack. In addition, this contributes to data reduction. Further reduction is achieved if a height profile curve is derived from the remaining elements to obtain an unambiguous imaging of the tonal values in a discrete depth plane (Z coordinate). What normally follows now is the projection of the stack structure onto the desired views. At least two views should be created, one of which might still be equal to one of the input images. However, this is done, as a rule, with the particular 3D image display device in mind that is used thereafter. The subsequent combination of the different views provided corresponds to the parameter assignment of the 3D display.
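The line-by-line comparison described above can be illustrated for a single pair of lines. The following is a minimal sketch only, under simplifying assumptions that are not taken from the text: one line is shifted by whole pixels (rather than both lines by ¼ or ½ pixel in opposite directions), tonal values are plain integers, and the name build_stack and the tolerance parameter tol are hypothetical.

```python
def build_stack(left, right, max_shift, tol=0):
    """Compare two image lines at every relative displacement z.

    Returns a dict mapping each plane z in [-max_shift, max_shift] to a list
    that holds the tonal value where both lines agree and None elsewhere.
    """
    width = len(left)
    stack = {}
    for z in range(-max_shift, max_shift + 1):
        plane = [None] * width
        for x in range(width):
            xr = x - z  # pixel of the right line lying on top of left[x]
            if 0 <= xr < width and abs(left[x] - right[xr]) <= tol:
                plane[x] = left[x]  # equal tonal values: save the value
        stack[z] = plane
    return stack


# Toy lines: the right line is the left line displaced by one pixel.
left = [10, 10, 50, 80, 10, 10]
right = [10, 50, 80, 10, 10, 10]
stack = build_stack(left, right, max_shift=2)
print(stack[1])  # [None, 10, 50, 80, 10, 10]
print(stack[0])  # [10, None, None, None, 10, 10]
```

In this toy example the textured pixels (50 and 80) match only in plane 1, while the uniform background value 10 also matches in plane 0. Such multiple matches are the ambiguities that the subsequent optimization of the stack structure has to resolve.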

Once the stack structure has been created, or following the method steps in accordance with the invention, the depth is determined for at least three original images of the n-tuple, preferably in the form of depth maps. Preferably, at least two depth maps having different resolutions are created.

Further, after the original images of the n-tuple and the respective associated depths have been taken over, preferably a reconstruction is performed by inverse projection of the images of the n-tuple into the stack space by means of depth maps, so that the stack structure is reconstructed and different views can afterwards again be generated from it by projection. Other methods of creating the views from the image data provided (n-tuples of images, depth information) are also possible.

Moreover, the original images of the n-tuple with the respective associated depths can be transmitted to the 3D image display device and then the reconstruction in accordance with the inventive method can be done first.

In general, the images of the n-tuple are created, e.g., by means of a 3D camera system, e.g. a multiple camera system consisting of several separate cameras.

Alternatively it is possible, in the method described above for the recording and display of images of a scene and/or an object, to create the images by means of a computer. In this case, preferably a depth map is created for each image, so that the rectification, color adjustment and depth or disparity recognition steps can be dropped. Preferably, at least two of the three depth maps differ in resolution. In a preferred embodiment, n=3 images may be provided, one of which has the (full-color) resolution of 1920×1080 pixels and the other two have the (full-color) resolution of 1280×720 (or 1024×768) pixels, whereas the appertaining depth maps have 960×540 and 640×360 (or 512×384) pixels, respectively. The image having the higher resolution corresponds, in spatial terms, to a perspective view lying between the perspective views of the other two images.

The 3D image display device employed can preferably display 2, 3, 4, 5, 6, 7, 8, 9 or even more views simultaneously or on a time average. It is particularly with such devices, known as “multi-view” 3D image display devices with 4 or more views displayed, that the special advantages of the invention take effect, namely, that with relatively few (e.g. three) original images, more views can be provided for spatial display than the number of original images. Incidentally, the combination, mentioned farther above, of at least two different views or images in accordance with the parameter assignment of the 3D display of a 3D image display device for spatial presentation without special viewing aids may contain a combination of views not only from different points in space but also from different points in time.

Another important advantage of the invention is the fact that, after the optimization of the stack structure, the depth is determined per original image. The resulting data have an extremely efficient data transfer format, viz. as n images (e.g. original images, or views) plus n depth images (preferably with n=3), so that a data rate is achieved that is markedly lower than that required if all views were transferred. As a consequence, a unit for the reconstruction of the stack structure and the unit for the projection of the stack structure onto the desired view, or units of other kind that perform the reconstruction of views differently, have to be integrated into the 3D image display device.
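The saving in data rate can be made tangible with a rough count of uncompressed bytes per frame. The sketch below uses the resolutions of the preferred embodiment named above; the comparison case of eight full-resolution views, the 3 bytes per color pixel and the 1 byte per depth pixel are illustrative assumptions, and compression (e.g. MPEG-4) is ignored.

```python
def raw_bytes(resolutions, bytes_per_pixel):
    """Uncompressed size in bytes of a set of images given as (width, height)."""
    return sum(w * h * bytes_per_pixel for w, h in resolutions)


# Resolutions from the preferred embodiment (n=3 images plus 3 depth maps).
COLOR = [(1920, 1080), (1280, 720), (1280, 720)]
DEPTH = [(960, 540), (640, 360), (640, 360)]

# Assumed: 3 bytes per color pixel, 1 byte per depth pixel.
images_plus_depth = raw_bytes(COLOR, 3) + raw_bytes(DEPTH, 1)

# Assumed comparison case: eight full-resolution views sent individually.
all_views = raw_bytes([(1920, 1080)] * 8, 3)

ratio = images_plus_depth / all_views
print(images_plus_depth, all_views, round(ratio, 3))
```

Under these assumptions the three images with their depth maps need about a quarter of the raw data of eight separately transmitted views.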

For the steps mentioned above, it is possible to use disparity instead of depth. Incidentally, the term “projection” here may, in principle, also mean a mere displacement.
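A projection in the sense of a mere displacement can be sketched for one image line as follows. This is an illustrative sketch only: the function name shift_view, the scaling factor alpha that selects the virtual viewpoint, the sign convention, and the rule that a larger disparity means a nearer pixel are assumptions made for the example.

```python
def shift_view(colors, disparity, alpha):
    """Create a new view of one image line by displacing each pixel.

    Every pixel x moves to x + round(alpha * disparity[x]). Where several
    pixels land on the same target, the one with the larger disparity
    (assumed to be nearer to the viewer) wins. Unfilled targets stay None.
    """
    width = len(colors)
    out = [None] * width
    best = [None] * width  # disparity of the pixel currently stored in out
    for x in range(width):
        xt = x + round(alpha * disparity[x])
        if 0 <= xt < width and (best[xt] is None or disparity[x] > best[xt]):
            out[xt] = colors[x]
            best[xt] = disparity[x]
    return out


colors = ["a", "b", "C", "D", "e", "f"]
disparity = [0, 0, 2, 2, 0, 0]  # "C" and "D" are assumed foreground pixels
print(shift_view(colors, disparity, alpha=1.0))
# ['a', 'b', None, None, 'C', 'D']
```

The None entries mark positions that become visible in the new view but were occluded in the source image; they would have to be filled from another image of the n-tuple or by interpolation.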

Of course, other depth or disparity recognition methods than the one described before can be used to detect depth or disparities from the n-tuple of images (with n>2), and/or to generate further views from this n-tuple of images. Such alternative methods or partial methods are described, for example, in the publications “Tao, H. and Sawhney, H.: Global matching criterion and color segmentation based stereo, in Proc. Workshop on the Application of Computer Vision (WACV2000), pp. 246-253, December 2000”, “M. Lin and C. Tomasi: Surfaces with occlusions from layered Stereo. Technical report, Stanford University, 2002. In preparation”, “C. Lawrence Zitnick, Sing Bing Kang, Matthew Uyttendaele, Simon Winder, Richard Szeliski: High-quality video view interpolation using a layered representation, International Conference on Computer Graphics and Interactive Techniques, ACM SIGGRAPH 2004, Los Angeles, Calif., pp. 600-608”, “S. M. Seitz and C. R. Dyer: View Morphing, Proc. SIGGRAPH 96, 1996, 21-30”.

By the method according to the invention, it is possible, in principle, that the images created are transferred to the image conversion device. Moreover, all views of each image generated by the image conversion device can be transferred to the 3D image display device.

In an advantageous embodiment, the invention comprises a method for the transmission of 3D information for the purpose of later display for spatial perception without special viewing aids, on the basis of at least three different views, a method in which, starting from at least one n-tuple of images (with n>2) characterizing different angles of view of an object or a scene, with at least two images of the n-tuple having different resolutions, the depth is determined for at least three images, and thereafter at least three images of the n-tuple together with the respective depth information (preferably in the form of depth maps) are transmitted in a transmission channel.

In a preferred embodiment, the n-tuple of images is a quadruple of images (n=4), with preferably three images having the same resolution and the fourth one having a higher resolution, and with the fourth image preferably belonging to the images transmitted in the transmission channel, so that, for example, one high-resolution image and two of the lower-resolution images are transmitted together with the depth information.

Herein, at least two of the depth maps determined may differ in resolution. The depth information is determined only from images of the n-tuple having the same resolution.

It is also possible that, from the depth information determined, the depth is also generated for at least one image of higher resolution.

In a further development of the method according to the invention as well as of the transmission method, the depth information determined from images of the n-tuple having the lowest existing resolution can be transformed into a higher resolution by way of edge recognition in the at least one image of higher resolution. This is helpful especially if, e.g., in the versions “1+2”, “1+3” and “1+5” described before, the high-resolution main camera zooms in on various scene details and/or objects, i.e. records at a magnification. In this case, there is no absolute need to vary the zoom settings of the satellite cameras, too. Instead, the resolution of the corresponding depth information for the main camera is increased as described above, so that the desired views can be created with sufficient quality.

Further developments provide for a great number of n-tuples of images and associated depth information to be processed in succession, so that a spatial display of moving images is made possible. Finally in this case, it is also possible to perform a spatial and temporal filtering of the great number of n-tuples of images.

The transmission channel may be, e.g., a digital TV signal, the Internet or a DVD (HD, SD, Blu-ray etc.). As a compression standard, MPEG-4 can be used to advantage.

It is also of advantage if at least two of the three depth maps have different resolutions. For example, in a preferred embodiment, n=3 images may be provided, one of them having the (full-color) resolution of 1920×1080 pixels, and two having the (full-color) resolution of 1280×720 (or 1024×768) pixels, whereas the pertaining depth maps have 960×540 and 640×360 (or 512×384) pixels, respectively. The image having the higher resolution corresponds, in spatial terms, to a perspective view lying between the perspective views of the other two images.

The 3D image display device employed can preferably display 2, 3, 4, 5, 6, 7, 8, 9 or even more views simultaneously or on a time average. Especially in those mentioned last, known as “multi-view” 3D image display devices with 4 or more views displayed, the special advantages of the invention take effect, viz. that with relatively few (e.g. three) original images, more views can be provided than the number of original images. The reconstruction of different views from the n-tuple of images transmitted together with the respective depth information (with at least two images of the n-tuple having different resolutions) is performed, e.g., in the following way: In a three-dimensional coordinate system, the color information of each image, observed from a suitable direction, is arranged in the depth positions marked by the respective depth information belonging to the image. This creates a colored three-dimensional volume with volume pixels (voxels), which can be imaged from different perspectives or directions by a virtual camera or by parallel projections. In this way, more than three views can be advantageously regenerated from the information transmitted. Other reconstruction algorithms for the views or images are possible as well.
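The voxel-based reconstruction just described might be sketched as follows. The sketch assumes, for simplicity, that the image and its depth map are already registered in one common coordinate system, that depth values are small non-negative integers with smaller values lying nearer to the viewer, and that views are formed by parallel projection with an integer slope; the function names fill_volume and parallel_projection are hypothetical.

```python
def fill_volume(volume, colors, depth):
    """Place each pixel's color at the depth position given by its depth map."""
    for y, row in enumerate(colors):
        for x, color in enumerate(row):
            volume[(x, y, depth[y][x])] = color
    return volume


def parallel_projection(volume, width, height, slope):
    """Image the voxel volume by a parallel projection.

    A voxel (x, y, z) lands in column x + slope * z of row y. If several
    voxels land on the same pixel, the one with the smallest z wins.
    """
    image = [[None] * width for _ in range(height)]
    nearest = [[None] * width for _ in range(height)]
    for (x, y, z), color in volume.items():
        u = x + slope * z
        if 0 <= u < width and (nearest[y][u] is None or z < nearest[y][u]):
            image[y][u] = color
            nearest[y][u] = z
    return image


# One-row toy image: the green pixel is in front (depth 0), the others behind.
volume = fill_volume({}, [["r", "g", "b"]], [[2, 0, 2]])
print(parallel_projection(volume, 3, 1, slope=0))  # [['r', 'g', 'b']]
print(parallel_projection(volume, 3, 1, slope=1))  # [[None, 'g', 'r']]
```

With slope 0 the original image is reproduced; other slopes yield laterally displaced views in which the foreground pixel keeps its column while the background shifts. Further images of the n-tuple could be added to the same volume with additional calls to fill_volume, provided they have been brought into the common coordinate system first.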

Regardless of this, the information transmitted is reconstructible in a highly universal way, e.g. as (perspective) views, tomographic slice images or voxels. Such image formats are of great advantage for special 3D presentation methods, such as volume 3D displays.

Moreover, in all transmission versions proposed by this invention it is possible to transmit meta-information in addition, e.g. in a so-called alpha channel. This may be information supplementing the images, such as geometric conditions of the n>2 images (e.g., relative angles, camera parameters), or transparency or contour information.

Finally, the problem of the invention can be solved by a method of transmitting 3D information for the purpose of subsequent display for spatial perception without special viewing aids, on the basis of at least three different views, whereby, starting from at least one n-tuple of images with n>2 that characterize different viewing angles of an object or scene, the depth is determined for at least three images, and at least three images of the n-tuple together with the respective depth information (preferably in the form of depth maps) are subsequently transmitted in a transmission channel.

Preferably the n-tuple of images is a triple of images (n=3), with the three images having the same resolution. It is also possible, however, that, e.g., n=5 or n=6 cameras generate 5 or 6 images, respectively, so that the depth information is determined from the quintuple or sextuple of images, or at least from three images of them, and 3 of the 5 or 6 images together with their depth maps are subsequently transmitted, even with the added possibility of a reduction of the resolution of individual images and/or depth maps.

Below, the invention is described in greater detail by example embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings show

FIG. 1: a sketch illustrating the principle of the arrangement according to the invention, with a main camera and three satellite cameras;

FIG. 2: a version with a main camera and two satellite cameras;

FIG. 3: a schematic illustration of the step-by-step displacement of two lines against one another, and generation of the Z coordinate;

FIG. 4: a scheme of optimization by elimination of ambiguities compared to FIG. 3;

FIG. 5: a scheme of optimization by reduction of the elements to an unambiguous height profile curve, compared to FIG. 4;

FIG. 6: a schematic illustration of the step-by-step displacement of three lines against one another, and generation of the Z coordinate;

FIG. 7: a scheme of optimization by elimination of ambiguities compared to FIG. 6;

FIG. 8: a scheme of optimization by reduction of the elements to an unambiguous height profile curve, compared to FIG. 7;

FIG. 9: a schematic illustration of a projection of a view from the scheme of optimization;

FIG. 10: a schematic illustration of an image combination of four images, suitable for spatial display without special viewing aids (state of the art); and

FIG. 11: a schematic illustration of the transmission method according to the invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

An arrangement according to the invention essentially consists of a 3D camera system 1, an image conversion device 2 and a 3D image display device 3. As shown in FIG. 1, the 3D camera system 1 contains three satellite cameras 14, 15 and 16 and one main camera 13; the image conversion device 2 contains a rectification unit 21, a color adjustment unit 22, a unit 23 for establishing the stack structure, a unit 24 for the optimization of the stack structure, and a unit 25 for the projection of the stack structure onto the desired view; and the 3D image display device 3 contains an image combination unit 31 and a 3D display 32, with the 3D display 32 displaying at least two views of a scene or object for spatial presentation. The 3D display 32 can also work on the basis of, say, 3, 4, 5, 6, 7, 8, 9 or even more views. As an example, a 3D display 32 of model “Spatial View 19 inch” is suitable, which displays 5 different views at a time.

FIG. 2 shows another arrangement according to the invention. Here, the 3D camera system 1 contains a main camera 13, a first satellite camera 14, and a second satellite camera 15. The image conversion device 2 contains a rectification unit 21, a color adjustment unit 22, a unit 23 for establishing the stack structure, a unit 24 for the optimization of the stack structure, a unit 25 for projecting the stack structure onto the desired view, and a unit 26 for determining the depth; and the 3D image display device 3 contains, as shown in FIG. 2, a unit 30 for the reconstruction of the stack structure, an image combination unit 31, and a 3D display 32.

According to the embodiment shown in FIG. 2, the 3D camera system 1 consists of a main camera 13 and two satellite cameras 14, 15, with the main camera 13 being a high-quality camera with a high resolving power, whereas the two satellite cameras 14, 15 are provided with a lower resolving power. As usual, the camera positions relative to each other are variable in spacing and alignment within the known limits, so that stereoscopic images can be taken. In the rectification unit 21, the camera images are rectified, i.e. a compensation of lens distortions, camera rotations, zoom differences, etc., is made. The rectification unit 21 is followed by the color adjustment unit 22. Here, the tonal/brightness values of the recorded images are balanced to a common level. The image data thus corrected are now fed to the unit 23 for establishing the stack structure.

Now, in principle, a line-by-line comparison is made of the inputimages, but only those of the satellite cameras (14, 15 according toFIG. 2, or 14, 15, 16 according to FIG. 1). The comparison according toFIG. 3 is based on the comparison of only two lines each. In the firststep, at first two lines are placed one on top of the other with thesame Y coordinate, which, according to FIG. 3, corresponds to plane 0.The comparison is made pixel by pixel, and, as shown in FIG. 3, theresult of the comparison is saved as a Z coordinate in accordance withthe existing comparison plane, a process in which pixels lying on top ofeach other retain their tonal value if it is identical; if it is not, notonal value is saved. In the second step, the lines are displaced byincrements of ½ pixel each as shown in FIG. 3, with the pixel beingassigned to plane 1, or a next comparison is made in plane 1, the resultof which is saved in plane 1 (Z coordinate). As can be seen from FIG. 3,the comparisons are generally made up to plane 7 and then with plane −1up to plane −7, each being saved as a Z coordinate in the respectiveplane. The number of planes corresponds to the maximum depth informationoccurring, and may vary depending on the image content. Thethree-dimensional structure thus established with the XYZ coordinatesmeans that, for each pixel, the degree of relative displacement betweenthe views is saved via the appertaining Z coordinate. As shown in FIG.6, the same comparison is made on the basis of the embodiment shown inFIG. 1, save that three lines are compared here accordingly. A simplecomparison between FIG. 6 and FIG. 3 shows that the comparison of threelines involves substantially fewer misinterpretations. Thus, it is ofadvantage to do the comparison with more than two lines. The stackstructure established, which is distinguished also by the fact that nowthe input images are no longer present individually, is fed to thesubsequent unit 24 for optimization of the stack structure. 
In the unit 24, ambiguous depictions of image elements are identified with the aim to delete such errors due to improbable combinations, so that a corrected set of data is generated in accordance with FIG. 4 or FIG. 7. In the next step, a height profile curve that is as shallow or smooth as possible is established from the remaining elements in order to achieve an unambiguous imaging of the tonal values in a discrete depth plane (Z coordinate). The results are shown in FIG. 5 and FIG. 8, respectively. The result according to FIG. 5 is now fed to the unit 25 for the projection of the stack structure onto the desired view as shown in FIG. 1. Here, the stack structure is projected onto a defined plane in the space. The (i.e. each) desired view is generated via the angles of the plane, as can be seen in FIG. 9. As a rule, at least one view is generated that is not exactly equal to any of the images recorded by the camera system 1. All views generated are present at the output port of the image conversion device 2 and can thus be transferred to the subsequent 3D image display device 3 for stereoscopic presentation; by means of the image combination unit 31 incorporated, at first the different views are combined in accordance with the given parameter assignment of the 3D display 32.
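The search for a height profile curve that is as smooth as possible can be illustrated by the following sketch. The description does not state how the unit 24 selects the curve; the dynamic programming used here, the cost measure (sum of the absolute plane jumps between neighboring pixels), the function name smooth_profile and the sample data are assumptions chosen for this example only.

```python
from typing import Dict, List, Optional


def smooth_profile(candidates: List[List[int]]) -> List[Optional[int]]:
    """Select one plane z per pixel so that the sum of |z[x] - z[x-1]| is minimal.

    candidates[x] lists the planes that hold a tonal value at pixel x.
    Pixels without any candidate keep None. (Assumed cost measure.)
    """
    profile: List[Optional[int]] = [None] * len(candidates)
    columns = [x for x, planes in enumerate(candidates) if planes]
    if not columns:
        return profile
    # cost[z]: smallest accumulated jump of any curve ending in plane z
    cost: Dict[int, int] = {z: 0 for z in candidates[columns[0]]}
    back: List[Dict[int, int]] = []  # predecessor plane for each column
    for x in columns[1:]:
        new_cost: Dict[int, int] = {}
        pointers: Dict[int, int] = {}
        for z in candidates[x]:
            # ties are resolved in favor of the plane closest to plane 0
            prev = min(cost, key=lambda p: (cost[p] + abs(z - p), abs(p)))
            new_cost[z] = cost[prev] + abs(z - prev)
            pointers[z] = prev
        back.append(pointers)
        cost = new_cost
    z = min(cost, key=lambda p: (cost[p], abs(p)))
    # walk back from the last column to the first one
    for x, pointers in zip(reversed(columns), [*reversed(back), None]):
        profile[x] = z
        if pointers is not None:
            z = pointers[z]
    return profile


# Hypothetical candidates per pixel, e.g. taken from an ambiguous stack line.
candidates = [[0], [-2, 1], [1], [1], [-1, 0, 1], [0, 1, 2]]
profile = smooth_profile(candidates)
print(profile)
```

For the sample data, each pixel is assigned exactly one plane, and the curve contains a single jump of one plane between the first and the second pixel.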

FIG. 2 illustrates another, optional way for transmitting the processed data to the 3D image display device 3. Here, the unit 24 for the optimization of the stack structure is followed by the unit 26 for determining the depth (broken line). Determining the depth of the images creates a particularly efficient data transfer format. This is because only three images and three depth images are transferred, preferably in the MPEG-4 format. According to FIG. 2, the 3D image display device 3 is provided, on the input side, with a unit 30 for reconstructing the stack structure, a subsequent image combination unit 31 and a 3D display 32. In the unit 30 for reconstructing the stack structure, the images and depths received can be particularly efficiently reconverted into the stack structure by inverse projection, so that the stack structure can be made available to the subsequent unit 25 for projecting the stack structure onto the desired view. The further procedure is then identical to the version illustrated in FIG. 1, save for the advantage that not all the views need to be transferred, especially if the unit 25 is integrated in the 3D image display device 3. This last-named, optional way can also be taken in the embodiment according to FIG. 1, provided that the circumstances are matched accordingly.
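The reconstruction by inverse projection and the subsequent projection onto a desired view may be pictured, for one image line, by the following sketch. It is a simplified model and not the implementation of the units 30 and 25. The function names, the parameter shift_per_plane (standing in for the angle of the projection plane) and the rule that higher planes cover lower ones are assumptions made for this example.

```python
from typing import Dict, List, Optional

Stack = Dict[int, List[Optional[int]]]


def reconstruct_stack(tones: List[int], depths: List[int], max_plane: int = 7) -> Stack:
    """Inverse projection: place every tonal value in the plane given by its depth."""
    if len(tones) != len(depths):
        raise ValueError("image line and depth line must have equal length")
    stack: Stack = {z: [None] * len(tones) for z in range(-max_plane, max_plane + 1)}
    for x, (tone, z) in enumerate(zip(tones, depths)):
        stack[z][x] = tone  # raises KeyError if z lies outside the stack
    return stack


def project(stack: Stack, shift_per_plane: float) -> List[Optional[int]]:
    """Project the stack onto one view by displacing each plane.

    Assumption: the view is described by a horizontal shift per plane,
    and higher planes are drawn last so that they cover lower planes.
    """
    width = len(next(iter(stack.values())))
    view: List[Optional[int]] = [None] * width
    for z in sorted(stack):
        for x, tone in enumerate(stack[z]):
            if tone is None:
                continue
            target = x + round(shift_per_plane * z)
            if 0 <= target < width:
                view[target] = tone
    return view


# Hypothetical image line with its depth line (one plane per pixel).
tones = [10, 10, 50, 60, 10, 10]
depths = [0, 1, 1, 1, 1, 1]
stack = reconstruct_stack(tones, depths, max_plane=2)
same_view = project(stack, 0.0)   # reproduces the transmitted image line
new_view = project(stack, -1.0)   # a view displaced by one pixel per plane
print(same_view)
print(new_view)
```

With a shift of zero the transmitted line is reproduced. With a shift of one pixel per plane a new view results in which the last pixel remains empty, because no transmitted pixel is projected onto it; in practice such gaps would have to be filled, e.g. from the other transmitted images.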

For better understanding, FIG. 10 shows a schematic illustration of a state-of-the-art method (JP 08-331605) to create an image combination of four images or views, suitable for spatial presentation on a 3D display without special viewing aids, for example on the basis of a suitable lenticular or barrier technology. For that purpose, the four images or views have been combined in the image combination unit 31 in accordance with the image combination structure suitable for the 3D display 32.

FIG. 11, finally, is a schematic illustration of the transmission method according to the invention. In an MPEG-4 data stream, a total of 3 color images and 3 depth images (or streams of moving images accordingly) are transmitted. To particular advantage, one of the color image streams has a resolution of 1920×1080 pixels, whereas the other two have a resolution of 1280×720 (or 1024×768) pixels. Each of the appertaining depth images (or depth image streams) is transmitted with half the horizontal and half the vertical resolution, i.e. 960×540 pixels and 640×360 (or 512×384) pixels, respectively. In the simplest case, the depth images consist of gray-scale images, e.g. with 256 or 1024 possible gray levels per pixel, with each gray level representing one depth value.
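The resolutions stated above can be checked with a short calculation. The function names are chosen for this example only; the pixel counts refer to uncompressed images and say nothing about the data rate after MPEG-4 encoding.

```python
from typing import List, Tuple

Resolution = Tuple[int, int]  # (width, height) in pixels


def depth_resolution(color: Resolution) -> Resolution:
    """Half the horizontal and half the vertical resolution of the color image."""
    width, height = color
    return width // 2, height // 2


def bits_per_depth_pixel(gray_levels: int) -> int:
    """Number of bits needed to distinguish the given number of gray levels."""
    return (gray_levels - 1).bit_length()


color_streams: List[Resolution] = [(1920, 1080), (1280, 720), (1280, 720)]
depth_streams = [depth_resolution(c) for c in color_streams]

color_pixels = sum(w * h for w, h in color_streams)
depth_pixels = sum(w * h for w, h in depth_streams)

print(depth_streams)               # depth image resolutions
print(color_pixels, depth_pixels)  # pixels per set of 3 color / 3 depth images
print(bits_per_depth_pixel(256), bits_per_depth_pixel(1024))
```

Since both dimensions are halved, each depth image contains one quarter of the pixels of the appertaining color image; 256 gray levels correspond to 8 bits and 1024 gray levels to 10 bits per depth pixel.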

In another embodiment, the highest-resolution color image would have, for example, 4096×4096 pixels, and the other color images would have 2048×2048 or 1024×1024 pixels. The appertaining depth images (or depth image streams) are transmitted with half the horizontal and half the vertical resolution. This version would be of advantage if the same data record is to be used for stereoscopic presentations of particularly high resolution (e.g. in the 3D movie theater with right and left images) as well as for less well-resolved 3D presentation on 3D displays, but then with at least two views presented.

While the foregoing description and drawings represent the present invention, it will be obvious to those skilled in the art that various changes may be made therein without departing from the true spirit and scope of the present invention.

LIST OF REFERENCE NUMBERS

-   1 Camera system
-   13 Main camera
-   14 First satellite camera
-   15 Second satellite camera
-   16 Third satellite camera
-   2 Image conversion device
-   21 Rectification unit
-   22 Color adjustment unit
-   23 Unit for establishing the stack structure
-   24 Unit for optimizing the stack structure
-   25 Unit for projecting the stack structure onto the desired view
-   26 Unit for determining the depth
-   3 3D image display device
-   30 Unit for reconstructing the stack structure
-   31 Image combination unit
-   32 3D display

1-39. (canceled)
40. The arrangement for the recording of images of a scene and/or an object and their display for spatial perception, comprising: at least one main camera of a first camera type for the recording of images; at least two satellite cameras of a second camera type for the recording of images, with the camera types differing in at least one parameter; an image conversion device, arranged downstream of the cameras, that receives and processes the initial image data, said image conversion device performing, among other processes, a depth or disparity recognition employing only those images recorded by cameras of the same camera type (by said at least two satellite cameras), but not the remaining images; and a 3D image display device, connected to the image conversion device, that displays the provided image data for spatial perception without special aids, with the 3D image display device displaying at least two views.
41. The arrangement as claimed in claim 40, wherein the two camera types differ at least in the resolution of the images to be recorded.
42. The arrangement as claimed in claim 40, wherein the two camera types differ at least in the built-in imaging chip.
43. The arrangement as claimed in claim 40, wherein exactly one main camera and two satellite cameras are provided.
44. The arrangement as claimed in claim 40, wherein exactly one main camera and three satellite cameras are provided.
45. The arrangement as claimed in claim 40, wherein exactly one main camera and five satellite cameras are provided.
46. The arrangement as claimed in claim 40, wherein the second camera type has a lower resolution than the first camera type.
47. The arrangement as claimed in claim 43, wherein the main camera is arranged between the satellite cameras.
48. The arrangement as claimed in claim 40, wherein at least one partially transparent mirror is arranged in front of each of the objectives of the main camera and all satellite cameras.
49. The arrangement as claimed in claim 44, wherein the center points of the objectives of the three satellite cameras form a triangle.
50. The arrangement as claimed in claim 49, wherein the triangle is an isosceles triangle.
51. The arrangement as claimed in claim 49, wherein the center point of the objective of the main camera is arranged inside said triangle, with the triangle to be understood to include its sides.
52. The arrangement as claimed in claim 44, wherein one satellite camera and the main camera are optically arranged relative to each other in such a way that both record an image on essentially the same optical axis, for which purpose preferably at least one partially transparent mirror is arranged between the two cameras.
53. The arrangement as claimed in claim 52, wherein the two other satellite cameras are arranged to form a straight line or a triangle together with the satellite camera associated to the main camera.
54. The arrangement as claimed in claim 40, wherein the image conversion device generates at least two views of the scene or object recorded, and that, for generating these at least two views, the image conversion device employs, besides the depth or disparity data recognized, the image recorded by the at least one main camera and at least one more image recorded by the satellite cameras, but not necessarily the images of all cameras provided.
55. The arrangement as claimed in claim 54, wherein one of the at least three views generated is still equal to one of the input images.
56. The arrangement as claimed in claim 40, wherein the main camera or all main cameras, and all satellite cameras record with frame-accurate synchronization at a tolerance of maximally 100 frames per 24 hours.
57. A method for the recording and display of images of a scene and/or an object, comprising the following steps: generating at least one n-tuple of images, with n>2, with at least two images of the n-tuple having different resolutions; transferring the image data to an image conversion device, in which then a rectification, a color adjustment, a depth or disparity recognition and subsequent generation of further views from the n or less than n images of the said n-tuple and from the depth or disparity recognition data are carried out, with at least one view being generated that is not exactly equal to any of the n-tuple of images generated, and with the image conversion device employing, for depth or disparity recognition, only such images of the n-tuple that have the same resolution; subsequently generating a combination of at least two different views or images in accordance with the parameter assignment of the 3D display of a 3D image display device, for spatial presentation without special aids; and finally presenting the combined 3D image on the 3D display.
58. The method as claimed in claim 57, wherein, for the depth or disparity recognition, those images of equal resolution are employed whose resolution has the lowest total number of pixels compared with all other resolutions provided.
59. The method as claimed in claim 58, wherein, for depth recognition, a stack structure is established by means of a line-by-line comparison of the pre-processed initial image data of an n-tuple, precisely, of those images of the n-tuple only that have the same resolution, in such a way that first those lines of the different images of an n-tuple which have the same Y coordinate are placed in register on top of each other and then a first comparison is made, the result of the comparison being saved in one line in such a way that equal tonal values in register are saved, whereas different tonal values are deleted, which is followed by a displacement of the lines in opposite directions by specified increments of preferably ¼ to 2 pixels, the results after each increment being saved in further lines analogously to the first comparison; so that, as a result after the comparisons made for each pixel, the Z coordinate provides the information about the degree of displacement of the views relative to each other.
60. The method as claimed in claim 59, wherein, after the establishment of the stack structure, an optimization is made in such a way that ambiguities are eliminated, and/or a reduction of the elements to an unambiguous height profile curve is carried out.
61. The method as claimed in claim 59, wherein, after the establishment of the stack structure or after the steps described in claim 60, the depth is determined for at least three original images of the n-tuple, preferably in the form of depth maps.
62. The method as claimed in claim 61, wherein, after transfer of the original images of the n-tuple and the respective depths appertaining to them, a reconstruction is carried out by inverse projection of the views of the n-tuple into the stack space by depth maps, so that the stack structure is reconstructed, and so that again different views can be subsequently generated therefrom by projection.
63. The method as claimed in claim 57, wherein the images generated are transmitted to the image conversion device.
64. The method as claimed in claim 57, wherein all views generated of each image by the image conversion device are transmitted to the 3D image display device.
65. The method as claimed in claim 61, wherein the original images of the n-tuple with the respective depths appertaining to them are transmitted to the 3D image display device, after which first the reconstruction according to claim 62 is carried out.
66. The method as claimed in claim 57, wherein the images of the n-tuple are generated by a 3D camera system.
67. The method as claimed in claim 57, wherein the images of the n-tuple are generated by a computer.
68. The method as claimed in claim 61, wherein at least two depth maps differing in resolution are generated.
69. A method for the transmission of 3D information for the purpose of later display for spatial perception without special aids, on the basis of at least two different views, comprising the steps of: proceeding from at least one n-tuple of images, with n>2, which characterize different angles of view of an object or a scene, with at least two images of the n-tuple having different resolutions; determining the depth for at least three images; and thereafter, at least three images of the n-tuple, together with the respective depth information, are transmitted in a transmission channel.
70. The method as claimed in claim 69, wherein the depth information is in the form of depth maps.
71. The method as claimed in claim 69, wherein the n-tuple of images is a quadruple of images (n=4), with three images preferably having the same resolution, whereas the fourth image has a higher resolution and preferably belongs to the images transmitted in the transmission channel.
72. The method as claimed in claim 69, wherein at least two of the three depth maps have different resolutions.
73. The method as claimed in claim 69, wherein the image data and the depth information are generated in the MPEG-4 format.
74. The method as claimed in claim 69, wherein the depth information is determined only from such images of the n-tuple that have the same resolution.
75. The method as claimed in claim 74, wherein, from the depth information determined, the depth also for at least one image of higher resolution is generated.
76. The method as claimed in claim 57, wherein depth information determined from images of the n-tuple that have the lowest resolution provided are transformed into a higher resolution by way of edge recognitions in the at least one image of higher resolution.
77. The method as claimed in claim 57, wherein a great number of n-tuples of images and appertaining depth information are processed in succession, so that a spatial display of moving images is made possible.
78. The method as claimed in claim 77, wherein the great number of n-tuples of images is subjected to spatial and temporal filtering.
79. A method of transmitting 3D information for the purpose of subsequent display for spatial perception without special aids, on the basis of at least two different views, comprising the steps of: proceeding from at least one n-tuple of images with n>2, which characterize different viewing angles of an object or a scene; determining the depth for at least three images; and thereafter at least three images of the n-tuple, together with the respective depth information, are transmitted in a transmission channel.
80. The method of claim 79, wherein the depth information is in the form of depth maps.
81. The method of claim 79, wherein the n-tuple of images is a triple of images (n=3), with the three images having the same resolution.